Kernel Distribution Embeddings: Universal Kernels, Characteristic Kernels and Kernel Metrics on Distributions
Kernel mean embeddings have recently attracted the attention of the machine
learning community. They map measures $\mu$ from some set $M$ to functions in a
reproducing kernel Hilbert space (RKHS) with kernel $k$. The RKHS distance of
two mapped measures is a semi-metric $d_k$ over $M$. We study three questions.
(I) For a given kernel, what sets $M$ can be embedded? (II) When is the
embedding injective over $M$ (in which case $d_k$ is a metric)? (III) How does
the $d_k$-induced topology compare to other topologies on $M$? The existing
machine learning literature has addressed these questions in cases where $M$ is
(a subset of) the finite regular Borel measures. We unify, improve and
generalise those results. Our approach naturally leads to continuous and
possibly even injective embeddings of (Schwartz-) distributions, i.e.,
generalised measures, but the reader is free to focus on measures only. In
particular, we systemise and extend various (partly known) equivalences between
different notions of universal, characteristic and strictly positive definite
kernels, and show that on an underlying locally compact Hausdorff space, $d_k$
metrises the weak convergence of probability measures if and only if $k$ is
continuous and characteristic.
Comment: Old and longer version of the JMLR paper with same title (published
2018). Please start with the JMLR version. 55 pages (33 pages main text, 22
pages appendix), 2 tables, 1 figure (in appendix)
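As a rough illustration of the RKHS distance between two embedded measures described in the abstract above, the following sketch estimates the squared distance from samples. The Gaussian kernel, the `bandwidth` parameter, the function names, and the biased (V-statistic) estimator are assumptions chosen for the example; they are not taken from the paper.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between rows of x (n, d) and rows of y (m, d)."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of the squared RKHS distance between the
    kernel mean embeddings of the empirical measures of x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy
```

With a characteristic kernel such as the Gaussian, the population version of this quantity is zero only when the two measures coincide, which is the injectivity property the abstract refers to.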
Distribution-Dissimilarities in Machine Learning
Any binary classifier (or score-function) can be used to define a dissimilarity
between two distributions. Many well-known distribution-dissimilarities are
actually classifier-based: total variation, KL- or JS-divergence, Hellinger
distance, etc. And many recent popular generative modeling algorithms compute
or approximate these distribution-dissimilarities by explicitly training a
classifier: e.g. generative adversarial networks (GAN) and their variants.
This thesis introduces and studies such classifier-based
distribution-dissimilarities. After a general introduction, the first part
analyzes the influence of the classifiers' capacity on the dissimilarity's
strength for the special case of maximum mean discrepancies (MMD) and provides
applications. The second part studies applications of classifier-based
distribution-dissimilarities in the context of generative modeling and presents
two new algorithms: Wasserstein Auto-Encoders (WAE) and AdaGAN. The third and
final part focuses on adversarial examples, i.e. targeted but imperceptible
input-perturbations that lead to drastically different predictions of an
artificial classifier. It shows that adversarial vulnerability of neural network
based classifiers typically increases with the input-dimension, independently
of the network topology.
PopSkipJump: Decision-Based Attack for Probabilistic Classifiers
Most current classifiers are vulnerable to adversarial examples, small input
perturbations that change the classification output. Many existing attack
algorithms cover various settings, from white-box to black-box classifiers, but
typically assume that the answers are deterministic and often fail when they
are not. We therefore propose a new adversarial decision-based attack
specifically designed for classifiers with probabilistic outputs. It is based
on the HopSkipJump attack by Chen et al. (2019, arXiv:1904.02144v5), a strong
and query efficient decision-based attack originally designed for deterministic
classifiers. Our P(robabilisticH)opSkipJump attack adapts its number of queries
to maintain HopSkipJump's original output quality across various noise levels,
while converging to its query efficiency as the noise level decreases. We test
our attack on various noise models, including state-of-the-art off-the-shelf
randomized defenses, and show that they offer almost no extra robustness to
decision-based attacks. Code is available at
https://github.com/cjsg/PopSkipJump .
Comment: ICML'21. Code available at https://github.com/cjsg/PopSkipJump . 9
pages & 7 figures in main part, 14 pages & 10 figures in appendix
Removing systematic errors for exoplanet search via latent causes
We describe a method for removing the effect of confounders in order to
reconstruct a latent quantity of interest. The method, referred to as
half-sibling regression, is inspired by recent work in causal inference using
additive noise models. We provide a theoretical justification and illustrate
the potential of the method in a challenging astronomy application.
Comment: Extended version of a paper appearing in the Proceedings of the 32nd
International Conference on Machine Learning, Lille, France, 2015
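The idea of half-sibling regression mentioned in the abstract above can be sketched as follows: the observed signal is regressed on other measurements that share the same systematic error but are independent of the latent quantity, and the prediction is subtracted. The linear ridge regressor, the `ridge` parameter, and the function name are assumptions made for this sketch; the abstract does not specify a regressor.

```python
import numpy as np


def half_sibling_residual(y, x, ridge=1e-3):
    """Estimate a latent signal by removing what the predictors explain.

    y: (n,) observed signal, assumed to be latent quantity plus systematics.
    x: (n, p) other measurements, assumed to share the systematics but to be
       independent of the latent quantity.
    Returns the centred residual y - E[y | x] under a linear ridge model.
    """
    x_centred = x - x.mean(axis=0)
    y_centred = y - y.mean()
    # Ridge-regularised normal equations for the regression of y on x.
    gram = x_centred.T @ x_centred + ridge * np.eye(x.shape[1])
    weights = np.linalg.solve(gram, x_centred.T @ y_centred)
    return y_centred - x_centred @ weights
```

The residual recovers the latent quantity only up to an additive constant, and only to the extent that the predictors capture the shared systematics.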
Assaying Out-Of-Distribution Generalization in Transfer Learning
Since out-of-distribution generalization is a generally ill-posed problem,
various proxy targets (e.g., calibration, adversarial robustness, algorithmic
corruptions, invariance across shifts) were studied across different research
programs resulting in different recommendations. While sharing the same
aspirational goal, these approaches have never been tested under the same
experimental conditions on real data. In this paper, we take a unified view of
previous work, highlighting message discrepancies that we address empirically,
and providing recommendations on how to measure the robustness of a model and
how to improve it. To this end, we collect 172 publicly available dataset pairs
for training and out-of-distribution evaluation of accuracy, calibration error,
adversarial attacks, environment invariance, and synthetic corruptions. We
fine-tune over 31k networks, from nine different architectures in the many- and
few-shot setting. Our findings confirm that in- and out-of-distribution
accuracies tend to increase jointly, but show that their relation is largely
dataset-dependent, and in general more nuanced and more complex than posited by
previous, smaller-scale studies.
Object-Centric Multiple Object Tracking
Unsupervised object-centric learning methods allow the partitioning of scenes
into entities without additional localization information and are excellent
candidates for reducing the annotation burden of multiple-object tracking (MOT)
pipelines. Unfortunately, they lack two key properties: objects are often split
into parts and are not consistently tracked over time. In fact,
state-of-the-art models achieve pixel-level accuracy and temporal consistency
by relying on supervised object detection with additional ID labels for the
association through time. This paper proposes a video object-centric model for
MOT. It consists of an index-merge module that adapts the object-centric slots
into detection outputs and an object memory module that builds complete object
prototypes to handle occlusions. Benefiting from object-centric learning, we
only require sparse detection labels (0%-6.25%) for object localization and
feature binding. Relying on our self-supervised
Expectation-Maximization-inspired loss for object association, our approach
requires no ID labels. Our experiments significantly narrow the gap between the
existing object-centric model and the fully supervised state-of-the-art and
outperform several unsupervised trackers.
Comment: ICCV 2023 camera-ready version
Search for dark matter produced in association with bottom or top quarks in √s = 13 TeV pp collisions with the ATLAS detector
A search for weakly interacting massive particle dark matter produced in association with bottom or top quarks is presented. Final states containing third-generation quarks and missing transverse momentum are considered. The analysis uses 36.1 fb⁻¹ of proton–proton collision data recorded by the ATLAS experiment at √s = 13 TeV in 2015 and 2016. No significant excess of events above the estimated backgrounds is observed. The results are interpreted in the framework of simplified models of spin-0 dark-matter mediators. For colour-neutral spin-0 mediators produced in association with top quarks and decaying into a pair of dark-matter particles, mediator masses below 50 GeV are excluded assuming a dark-matter candidate mass of 1 GeV and unitary couplings. For scalar and pseudoscalar mediators produced in association with bottom quarks, the search sets limits on the production cross-section of 300 times the predicted rate for mediators with masses between 10 and 50 GeV and assuming a dark-matter mass of 1 GeV and unitary coupling. Constraints on colour-charged scalar simplified models are also presented. Assuming a dark-matter particle mass of 35 GeV, mediator particles with mass below 1.1 TeV are excluded for couplings yielding a dark-matter relic density consistent with measurements.